@olivialynn

The second of three planned PRs for LSDB's #449.

Overall plan:

A little more detail:

  1. Set up stage:
    1. We initialize the ResumePlan to have a concept of threshold_mode. The threshold mode defaults to row_count, but is set to mem_size if byte_pixel_threshold is in the input args (and is not None).
    2. We also specify some paths in ResumePlan: MEM_SIZE_HISTOGRAM_BINARY_FILE and MEM_SIZE_HISTOGRAMS_DIR.
    3. We run ResumePlan's gather_plan, which creates the histogram directory (or directories), among other setup steps.
  2. Mapping stage:
    1. Here's where we map input files to HEALPix pixels (via the call to map_reduce's map_to_pixels), and in doing so, we create the histogram. I am electing to make two histograms here, so long as the histogram mode is set to mem_size. Is it OK to just call mr.map_to_pixels a second time like that?
    2. We add a memory-size calculation method, _get_mem_size_of_chunk, and its two helpers, _get_row_mem_size_data_frame and _get_row_mem_size_pa_table.
  3. Binning stage:
    1. No changes for now, except that we add an explicit parameter which_histogram to read_histogram to show that we're reading the row_count histogram. It's the default, but I wanted to include it for readability/safety.
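
For illustration only, here is a minimal sketch of the mode selection and the two-histogram idea described above, in plain Python. It is not the code in this PR: the function names (resolve_threshold_mode, histograms_to_build, build_histograms) are hypothetical, per-row byte sizes are assumed to be already known, and both histograms are accumulated in a single pass for brevity rather than through a second map_to_pixels call.

```python
from typing import Optional

ROW_COUNT = "row_count"
MEM_SIZE = "mem_size"


def resolve_threshold_mode(byte_pixel_threshold: Optional[int] = None) -> str:
    """Hypothetical helper: use mem_size only when a byte threshold is given."""
    return MEM_SIZE if byte_pixel_threshold is not None else ROW_COUNT


def histograms_to_build(threshold_mode: str) -> list:
    """The row_count histogram is always built; mem_size is added on request."""
    if threshold_mode == MEM_SIZE:
        return [ROW_COUNT, MEM_SIZE]
    return [ROW_COUNT]


def build_histograms(pixel_ids, row_sizes, num_pixels, threshold_mode):
    """Accumulate per-pixel row counts and, in mem_size mode, per-pixel bytes.

    pixel_ids: pixel index for each row (assumed already computed).
    row_sizes: size in bytes of each row (assumed already computed).
    """
    histograms = {
        name: [0] * num_pixels for name in histograms_to_build(threshold_mode)
    }
    for pixel, size in zip(pixel_ids, row_sizes):
        histograms[ROW_COUNT][pixel] += 1
        if MEM_SIZE in histograms:
            histograms[MEM_SIZE][pixel] += size
    return histograms


if __name__ == "__main__":
    mode = resolve_threshold_mode(byte_pixel_threshold=1_000_000)
    print(build_histograms([0, 2, 2], [10, 20, 30], 4, mode))
```

Keeping the row_count histogram in both modes mirrors the binning stage above, which still reads the row_count histogram by default.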

@olivialynn olivialynn closed this Nov 5, 2025
@olivialynn olivialynn reopened this Nov 5, 2025
@olivialynn olivialynn closed this Nov 5, 2025

codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 45.00000% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.03%. Comparing base (9754d78) to head (3da9e43).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/hats_import/catalog/map_reduce.py | 30.43% | 32 Missing ⚠️ |
| src/hats_import/catalog/resume_plan.py | 57.50% | 17 Missing ⚠️ |
| src/hats_import/catalog/run_import.py | 33.33% | 4 Missing ⚠️ |
| src/hats_import/catalog/arguments.py | 75.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #618      +/-   ##
==========================================
- Coverage   92.64%   90.03%   -2.61%     
==========================================
  Files          32       32              
  Lines        1916     2007      +91     
==========================================
+ Hits         1775     1807      +32     
- Misses        141      200      +59     
